%
% Copyright 2014, General Dynamics C4 Systems
%
% SPDX-License-Identifier: GPL-2.0-only
%

\chapter{\label{ch:vspace}Address Spaces and Virtual Memory}

A virtual address space in seL4 is called a VSpace. In a similar
way to a CSpace (see \autoref{ch:cspace}), a VSpace is composed of objects
provided by the microkernel. Unlike CSpaces, these objects for managing
virtual memory largely correspond to those of the hardware. Consequently,
each architecture defines its own objects for the top-level VSpace and further intermediate paging structures.
Common to every architecture is the \obj{Page}, representing a frame of physical memory.
The kernel also includes \obj{ASID Pool} and
\obj{ASID Control} objects for tracking the status of address spaces.

These VSpace-related objects are sufficient to implement the
hardware data structures required to create, manipulate, and destroy
virtual memory address spaces. It should be noted that, as usual, the
manipulator of a virtual memory space needs the appropriate
capabilities to the required objects.

\section{Objects}

\subsection{Hardware Virtual Memory Objects}

Each architecture has a top-level paging structure (level 0) and a number of intermediate levels.
The top-level paging structure corresponds directly to the higher-level concept of a VSpace in seL4.
For each architecture, the VSpace is realised as a different object, as determined by the
architectural details.

In general, each paging structure at each level contains slots where the next level paging structure,
or a specifically sized frame of memory, can be mapped. If the previous level is not mapped,
a mapping operation will fail. Developers need to manually create and map all paging structures.
The size and type of structure at each level, and the number of bits in the virtual address resolved
for that level, is hardware defined.

seL4 provides methods for operating on these
hardware paging structures including mapping and cache operations. Mapping operations are invoked on
the capability being mapped, e.g. to map a level 1 paging structure at a specific virtual address,
the capability to the corresponding object is invoked with a map operation, where the top-level
structure is passed as an argument.

In general, the top-level structure has no invocations for mapping, but is used as an argument to
several other virtual-memory related object invocations.
For some architectures, the top-level page table can be invoked for cache operations.
By making these cache related operations invocations on page directory capabilities in addition to
the page capabilities themselves, the
API allows users more flexible policy options. For example, a process that has delegated a page
directory can conduct cache operations on all frames mapped from that capability without access
to those capabilities directly.

The rest of this section details the paging structures for each architecture.

\subsubsection{IA-32}

On IA-32, the VSpace is realised as a \texttt{PageDirectory}, which covers the entire 4\,GiB range
in the 32-bit address space, and forms the top-level paging structure. Second level page-tables
(\texttt{PageTable} objects) each cover a 4\,MiB range.
Structures at both levels are indexed by 10 bits in the virtual address.

\begin{tabularx}{\textwidth}{Xlll} \toprule
\emph{Object}          & \emph{Address Bits} & \emph{Level} & \emph{Methods} \\ \midrule
\texttt{PageDirectory} & 22---31             & 0            & \autoref{group__x86__seL4__X86__PageDirectory} \\
\texttt{PageTable}     & 12---21             & 1            & \autoref{group__x86__seL4__X86__PageTable} \\
\bottomrule
\end{tabularx}

\subsubsection{x64}

On x86-64, the VSpace is realised as a \texttt{PML4}. Three further levels of paging structure are
defined, as shown in the table below. All structures are indexed with 9 bits of the virtual
address.

\begin{tabularx}{\textwidth}{Xlll} \toprule
\emph{Object}          & \emph{Address Bits} & \emph{Level} & \emph{Methods} \\ \midrule
\texttt{PML4}          & 39---47             & 0            & None \\
\texttt{PDPT}          & 30---38             & 1            & \autoref{group__x86__64__seL4__X86__PDPT} \\
\texttt{PageDirectory} & 21---29             & 2            & \autoref{group__x86__seL4__X86__PageDirectory} \\
\texttt{PageTable}     & 12---20             & 3            & \autoref{group__x86__seL4__X86__PageTable} \\
\bottomrule
\end{tabularx}

\subsubsection{AArch32}

Like IA-32, ARM AArch32 realise the VSpace as a \texttt{PageDirectory}, which covers the entire
4\,GiB address range, and a second-level \texttt{PageTable}. The second-level structures on AArch32
cover 1\,MiB address ranges.

ARM AArch32 processors have a two-level page-table structure.
The top-level page directory covers a range of 4\,GiB and each page table covers a 1\,MiB range.

\begin{tabularx}{\textwidth}{Xlll} \toprule
\emph{Object}          & \emph{Address Bits} & \emph{Level} & \emph{Methods} \\ \midrule
\texttt{PageDirectory} & 20---31             & 0            & \autoref{group__aarch32__seL4__ARM__PageDirectory} \\
\texttt{PageTable}     & 12---19             & 1            & \autoref{group__arm__seL4__ARM__PageTable} \\
\bottomrule
\end{tabularx}

\subsubsection{AArch64}

ARM AArch64 processors have a four-level page-table structure, where the VSpace is realised as a
\texttt{PageGlobalDirectory}. All paging structures are index by 9 bits of the virtual address.

\begin{tabularx}{\textwidth}{Xlll} \toprule
\emph{Object}                    & \emph{Address Bits} & \emph{Level} & \emph{Methods} \\ \midrule
    \texttt{PageGlobalDirectory} & 39---47             & 0            & \autoref{group__aarch64__seL4__ARM__PageGlobalDirectory} \\
    \texttt{PageUpperDirectory}  & 30---38             & 1            & \autoref{group__aarch64__seL4__ARM__PageUpperDirectory} \\
\texttt{PageDirectory}           & 21---29             & 2            & \autoref{group__aarch64__seL4__ARM__PageDirectory} \\
\texttt{PageTable}               & 12---20             & 3            & \autoref{group__arm__seL4__ARM__PageTable} \\
\bottomrule
\end{tabularx}

\subsection{RISC-V}

RISC-V provides the same paging structure for all levels, \texttt{PageTable}. The VSpace is then
realised as a \texttt{PageTable}.

\subsubsection{RISC-V 32-bit}

32-bit RISC-V \texttt{PageTables} are indexed by 10 bits of virtual address.

\begin{tabularx}{\textwidth}{Xlll} \toprule
\emph{Object}          & \emph{Address Bits} & \emph{Level} & \emph{Methods} \\ \midrule
\texttt{PageTable}     & 22---31             & 0            & \autoref{group__riscv__seL4__RISCV__PageTable} \\
\texttt{PageTable}     & 12---21             & 1            & \autoref{group__riscv__seL4__RISCV__PageTable} \\
\bottomrule
\end{tabularx}

\subsubsection{RISC-V 64-bit}

64-bit RISC-V follows the SV39 model, where \texttt{PageTables} are indexed by 9 bits of virtual address.
Although RISC-V allows
for multiple different numbers of paging levels, currently seL4 only supports exactly three levels
of paging structures.

\begin{tabularx}{\textwidth}{Xlll} \toprule
\emph{Object}          & \emph{Address Bits} & \emph{Level} & \emph{Methods} \\ \midrule
\texttt{PageTable}     & 30---38             & 0            & \autoref{group__riscv__seL4__RISCV__PageTable} \\
\texttt{PageTable}     & 21---29             & 1            & \autoref{group__riscv__seL4__RISCV__PageTable} \\
\texttt{PageTable}     & 12---20             & 2            & \autoref{group__riscv__seL4__RISCV__PageTable} \\
\bottomrule
\end{tabularx}

\subsection{Page}

A \obj{Page} object corresponds to a frame of physical memory that is used to
implement virtual memory pages in a virtual address space.

The virtual address for a \obj{Page} mapping
must be aligned to
the size of the \obj{Page} and must be mapped to a suitable VSpace, and every intermediate paging
structure required.
To map a page readable, the capability
to the page
that is being invoked must have read permissions. To map the page
writeable, the capability must have write permissions. The requested
mapping permissions are specified with an argument of type
\texttt{seL4\_CapRights} given to the mapping function.
If the capability does not have
sufficient permissions to authorise the given mapping, then
the mapping permissions are silently downgraded. Specific mapping permissions are dependant on the
architecture and are documented in the \autoref{sec:api_reference} for each function.

At minimum, each architecture defines \texttt{Map}, \texttt{Unmap} and
\texttt{GetAddress} methods for pages.
Methods for page objects for each architecture can be found in the \autoref{sec:api_reference}, and
are indexed per architecture in the table below.

\begin{tabularx}{\textwidth}{Xl} \toprule
\emph{Architectures} & \emph{Methods} \\ \midrule
IA32, X64            & \autoref{group__x86__seL4__X86__Page} \\
AArch32, AArch64     & \autoref{group__arm__seL4__ARM__Page} \\
    RISC-V           & \autoref{group__riscv__seL4__RISCV__Page} \\
\bottomrule
\end{tabularx}

Each architecture also defines a range of page sizes. In the next section we show the available page
sizes, as well as the \emph{mapping level}, which refers to
the level of the paging structure at which this page must be mapped.

\subsubsection{AArch32 page sizes}

\begin{tabularx}{\textwidth}{Xll} \toprule
    \emph{Constant}             & \emph{Size} & \emph{Mapping level} \\ \midrule
    \texttt{seL4\_PageBits}      & 4\,KiB      & 1                   \\
    \texttt{seL4\_LargePageBits} & 64\,KiB     & 1                   \\
    \texttt{seL4\_SectionBits}   & 1\,MiB      & 0                   \\
    \texttt{seL4\_SuperSectionBits} & 16\,MiB  & 0                   \\
    \bottomrule
\end{tabularx}

Mappings for sections and super sections consume 16 slots in the page table and page directory
respectively.

\subsubsection{AArch64 page sizes}

\begin{tabularx}{\textwidth}{Xll} \toprule
    \emph{Constant}             & \emph{Size} & \emph{Mapping level} \\ \midrule
    \texttt{seL4\_PageBits}      & 4\,KiB      & 3                   \\
    \texttt{seL4\_LargePageBits} & 2\,MiB     & 2                    \\
    \texttt{seL4\_HugePageBits}  & 1\,GiB     & 1                    \\
    \bottomrule
\end{tabularx}

\subsubsection{IA-32 page sizes}

\begin{tabularx}{\textwidth}{Xll} \toprule
    \emph{Constant}             & \emph{Size} & \emph{Mapping level} \\ \midrule
    \texttt{seL4\_PageBits}      & 4\,KiB      & 1                   \\
    \texttt{seL4\_LargePageBits} & 4\,MiB      & 0                   \\
    \bottomrule
\end{tabularx}

\subsubsection{X64 page sizes}

\begin{tabularx}{\textwidth}{Xll} \toprule
    \emph{Constant}             & \emph{Size} & \emph{Mapping level} \\ \midrule
    \texttt{seL4\_PageBits}      & 4\,KiB      & 3                   \\
    \texttt{seL4\_LargePageBits} & 2\,MiB      & 2                   \\
    \texttt{seL4\_HugePageBits}  & 1\,GiB      & 1                   \\
    \bottomrule
\end{tabularx}

\subsubsection{RISC-V 32-bit page sizes}

\begin{tabularx}{\textwidth}{Xll} \toprule
    \emph{Constant}             & \emph{Size} & \emph{Mapping level} \\ \midrule
    \texttt{seL4\_PageBits}      & 4\,KiB      & 1                   \\
    \texttt{seL4\_LargePageBits} & 4\,MiB      & 0                   \\
    \bottomrule
\end{tabularx}

\subsubsection{RISC-V 64-bit page sizes}

\begin{tabularx}{\textwidth}{Xll} \toprule
    \emph{Constant}             & \emph{Size} & \emph{Mapping level} \\ \midrule
    \texttt{seL4\_PageBits}      & 4\,KiB      & 2                   \\
    \texttt{seL4\_LargePageBits} & 2\,MiB      & 1                   \\
    \texttt{seL4\_HugePageBits}  & 1\,GiB      & 0                   \\
    \bottomrule
\end{tabularx}

\subsection{ASID Control}

For internal kernel book-keeping purposes, there is a fixed maximum
number of applications the system can support.  In order to manage
this limited resource, the microkernel provides an \obj{ASID Control}
capability. The \obj{ASID Control} capability is used to generate a
capability that authorises the use of a subset of available address-space identifiers.
This newly created capability is called an
\obj{ASID Pool}. \obj{ASID Control} only has a single \texttt{MakePool} method for each
architecture, listed in the table below.

\begin{tabularx}{\textwidth}{Xl} \toprule
\emph{Architectures} & \emph{Methods} \\ \midrule
IA32, X64            & \autoref{group__x86__seL4__X86__ASIDControl} \\
AArch32, AArch64     & \autoref{group__arm__seL4__ARM__ASIDControl} \\
RISC-V               & \autoref{group__riscv__seL4__RISCV__ASIDControl} \\
\bottomrule
\end{tabularx}

\subsection{ASID Pool}

An \obj{ASID Pool} confers the right to create a subset of the available
maximum applications. For a VSpace to be usable by an application, it
must be assigned to an ASID. This is done using a capability to an
\obj{ASID Pool}. The \obj{ASID Pool} object has a single method, \texttt{Assign}, for each
architecture:

\begin{tabularx}{\textwidth}{Xl} \toprule
\emph{Architectures} & \emph{Methods} \\ \midrule
IA32, X64            & \autoref{group__x86__seL4__X86__ASIDPool} \\
AArch32, AArch64     & \autoref{group__arm__seL4__ARM__ASIDPool} \\
RISC-V               & \autoref{group__riscv__seL4__RISCV__ASIDPool} \\
\bottomrule
\end{tabularx}

\section{Mapping Attributes}
A parameter of type \texttt{seL4\_ARM\_VMAttributes} or
\texttt{seL4\_x86\_VMAttributes} is used to specify the cache behaviour of the
page being mapped; possible values for ARM that can be bitwise OR'd together are
shown in \autoref{tbl:vmattr_arm} \ifxeightsix and an enumeration of valid values
for IA-32 are shown in \autoref{tbl:vmattr_ia32}\fi. Mapping attributtes can be updated on existing mappings using the Map invocation with the same virtual address.

\begin{table}[htb]
  \begin{center}
    \begin{tabularx}{\textwidth}{p{0.33\textwidth}X}
      \toprule
      Attribute & Meaning \\
      \midrule
      \texttt{seL4\_ARM\_PageCacheable} & Enable data in this mapping
      to be cached \\
      \texttt{seL4\_ARM\_ParityEnabled} & Enable parity checking for
      this mapping\\
      \texttt{seL4\_ARM\_ExecuteNever} & Map this memory as non-executable \\
      \bottomrule
    \end{tabularx}
    \caption{\label{tbl:vmattr_arm} Virtual memory attributes for ARM page
      table entries.}
  \end{center}
\end{table}

\begin{table}[htb]
  \begin{center}
    \begin{tabularx}{\textwidth}{p{0.33\textwidth}X}
      \toprule
      Attribute & Meaning \\
      \midrule
      \texttt{seL4\_x86\_WriteBack} & Read and writes are cached \\
      \texttt{seL4\_x86\_CacheDisabled} & Prevent data in this mapping
      from being cached \\
      \texttt{seL4\_x86\_WriteThrough} & Enable write through cacheing for this mapping \\
      \texttt{seL4\_x86\_WriteCombining} & Enable write combining for this mapping \\
      \bottomrule
    \end{tabularx}
    \caption{\label{tbl:vmattr_ia32} Virtual memory attributes for x86 page
      table entries.}
  \end{center}
\end{table}

\section{Sharing Memory}

seL4 does not allow \obj{Page Table}s to be shared, but does allow
pages to be shared between address spaces. 
To share a page, the capability to the 
\obj{Page} must first be
duplicated using the \apifunc{seL4\_CNode\_Copy}{cnode_copy} method and the new copy must
be used in the \apifunc{seL4\_ARM\_Page\_Map}{arm_page_map} \ifxeightsix or \apifunc{seL4\_x86\_Page\_Map}{x86_page_map} \fi method that maps the page into the second
address space. Attempting to map the same capability
twice will result in an error. 


\section{Page Faults}

Page faults are reported to the exception handler of the executed thread.
See \autoref{sec:vm-fault}.
